library(tidyverse)
Registered S3 methods overwritten by 'dbplyr':
  method         from
  print.tbl_lazy     
  print.tbl_sql      
── Attaching packages ────────────────────────────────────────────────────────────────────────────────────────────────────────── tidyverse 1.3.1 ──
✓ ggplot2 3.3.5     ✓ purrr   0.3.4
✓ tibble  3.1.6     ✓ dplyr   1.0.8
✓ tidyr   1.2.0     ✓ stringr 1.4.0
✓ readr   2.0.2     ✓ forcats 0.5.1
── Conflicts ───────────────────────────────────────────────────────────────────────────────────────────────────────────── tidyverse_conflicts() ──
x dplyr::filter() masks stats::filter()
x dplyr::lag()    masks stats::lag()
library(lubridate)

Attaching package: ‘lubridate’

The following objects are masked from ‘package:base’:

    date, intersect, setdiff, union
library(janitor)

Attaching package: ‘janitor’

The following objects are masked from ‘package:stats’:

    chisq.test, fisher.test
library(broom)
library(modelr)

Attaching package: ‘modelr’

The following object is masked from ‘package:broom’:

    bootstrap
library(caret)
Loading required package: lattice
Registered S3 method overwritten by 'data.table':
  method           from
  print.data.table     

Attaching package: ‘caret’

The following object is masked from ‘package:purrr’:

    lift
library(leaps)
library(GGally)
library(ggfortify)
raw_avocado <- read_csv("data/avocado.csv")
New names:
* `` -> ...1
Rows: 18249 Columns: 14
── Column specification ───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────
Delimiter: ","
chr   (2): type, region
dbl  (11): ...1, AveragePrice, Total Volume, 4046, 4225, 4770, Total Bags, Small Bags, Large Bags, XLarge Bags, year
date  (1): Date

ℹ Use `spec()` to retrieve the full column specification for this data.
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.

MVP

We’ve looked at a few different ways in which we can build models this week, including how to prepare them properly. This weekend we’ll build a multiple linear regression model on a dataset which will need some preparation. The data can be found in the data folder, along with a data dictionary.

We want to investigate the avocado dataset and, in particular, to model the AveragePrice of the avocados. Use the tools we’ve worked with this week to prepare your dataset and find appropriate predictors. Once you’ve built your model, use the validation techniques discussed on Wednesday to evaluate it. Feel free to focus on building either an explanatory or a predictive model, or both if you are feeling energetic!

As part of the MVP we want you not to just run the code but also have a go at interpreting the results and write your thinking in comments in your script.

Hints and tips

region may lead to many dummy variables. Think carefully about whether to include this variable or not (there is no one ‘right’ answer to this!).

Think about whether each variable is categorical or numerical. If categorical, make sure that the variable is represented as a factor.

We will not treat this data as a time series, so Date will not be needed in your models, but can you extract any useful features out of Date before you discard it?

If you want to build a predictive model, consider using either leaps or glmulti to help with this.

Exploratory Data Analysis

summary(raw_avocado)
      ...1            Date             AveragePrice    Total Volume           4046               4225               4770        
 Min.   : 0.00   Min.   :2015-01-04   Min.   :0.440   Min.   :      85   Min.   :       0   Min.   :       0   Min.   :      0  
 1st Qu.:10.00   1st Qu.:2015-10-25   1st Qu.:1.100   1st Qu.:   10839   1st Qu.:     854   1st Qu.:    3009   1st Qu.:      0  
 Median :24.00   Median :2016-08-14   Median :1.370   Median :  107377   Median :    8645   Median :   29061   Median :    185  
 Mean   :24.23   Mean   :2016-08-13   Mean   :1.406   Mean   :  850644   Mean   :  293008   Mean   :  295155   Mean   :  22840  
 3rd Qu.:38.00   3rd Qu.:2017-06-04   3rd Qu.:1.660   3rd Qu.:  432962   3rd Qu.:  111020   3rd Qu.:  150207   3rd Qu.:   6243  
 Max.   :52.00   Max.   :2018-03-25   Max.   :3.250   Max.   :62505647   Max.   :22743616   Max.   :20470573   Max.   :2546439  
   Total Bags         Small Bags         Large Bags       XLarge Bags           type                year         region         
 Min.   :       0   Min.   :       0   Min.   :      0   Min.   :     0.0   Length:18249       Min.   :2015   Length:18249      
 1st Qu.:    5089   1st Qu.:    2849   1st Qu.:    127   1st Qu.:     0.0   Class :character   1st Qu.:2015   Class :character  
 Median :   39744   Median :   26363   Median :   2648   Median :     0.0   Mode  :character   Median :2016   Mode  :character  
 Mean   :  239639   Mean   :  182195   Mean   :  54338   Mean   :  3106.4                      Mean   :2016                     
 3rd Qu.:  110783   3rd Qu.:   83338   3rd Qu.:  22029   3rd Qu.:   132.5                      3rd Qu.:2017                     
 Max.   :19373134   Max.   :13384587   Max.   :5719097   Max.   :551693.7                      Max.   :2018                     

We have 18249 rows and 14 variables

  1. x1 - Row index (read in as ...1, renamed to x1 by clean_names()) - this can be removed
  2. Date - runs from 2015-01-04 to 2018-03-25. We will not treat the data as a time series, so Date will not be used in the models, but a month (or quarter) feature can be extracted before it is discarded
  3. AveragePrice - this is the value we will be modelling/predicting - the average price of a single avocado
  4. Total Volume - total number of avocados
  5. 4046: Small/Medium Hass Avocado
  6. 4225: Large Hass Avocado
  7. 4770: Extra Large Hass Avocado
  8. Total Bags
  9. Small Bags
  10. Large Bags
  11. XLarge Bags
  12. type: conventional or organic
  13. year: the year
  14. region: the city or region of the observation
# Clean Names
raw_avocado <- raw_avocado %>% 
  clean_names()
# read_csv() already parsed date as a date (see the column specification above),
# so ymd() leaves it unchanged here; it is kept only as a safeguard
raw_avocado <- raw_avocado %>%
  mutate(date = ymd(date))
# Add in a month column
raw_avocado <- raw_avocado %>%
  mutate(month = month(date, label = TRUE, abbr = FALSE))
raw_avocado %>% 
  group_by(month) %>% 
  summarise(count=n())

Perhaps group the months into quarters, which would need fewer dummy variables (3 instead of 11)

# Add in a quarter column
raw_avocado<- raw_avocado %>%
  mutate(quarter = quarter(date))
# Box plot comparing type (conventional vs organic)
ggplot(raw_avocado, aes(x = as.factor(type), y = average_price)) + 
  geom_boxplot(fill = "slateblue", alpha = 0.2) + 
  xlab("type")

So organic avocados tend to have a higher average price than conventional ones

# Simple line graphs looking at some of the variables
ggplot(raw_avocado, aes(x=average_price)) + 
  geom_line(aes(y = x4225), color = "orange", alpha = 0.4) +
  geom_line(aes(y = x4046), color = "darkred", alpha = 0.4) +
  geom_line(aes(y = x4770), color="steelblue", alpha = 0.4) 

Doesn’t really tell us much - but we get an idea of the shape of the data.

regions <- raw_avocado %>% 
  group_by(region) %>% 
  summarise(count = n())

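There are 54 regions, with roughly the same number of observations from each (18249 rows is not an exact multiple of 54, so the counts cannot all be identical). For modelling, 53 dummy variables could be a problem - but perhaps we can find one or two regions that are key for driving up prices.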
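A quick way to check whether groups really are balanced is to tabulate the group sizes themselves. The sketch below uses a small made-up data frame (the `toy` object and its region labels are hypothetical, not the avocado data) and base R only:

```r
# Hypothetical toy data: regions A and B have 5 rows each, C has only 4
toy <- data.frame(
  region = c(rep("A", 5), rep("B", 5), rep("C", 4)),
  stringsAsFactors = FALSE
)

# Number of observations per region
obs_per_region <- table(toy$region)

# How many regions share each count (one row per distinct group size)
size_summary <- table(obs_per_region)
print(size_summary)

# TRUE only if every region has exactly the same number of rows
balanced <- length(unique(as.vector(obs_per_region))) == 1
print(balanced)
```

On the real data the same idea would apply to the `count` column of the `regions` summary.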

Perhaps we should look at some simple stats per region.

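regions <- raw_avocado %>% 
  group_by(region) %>% 
  # Name each summary column so the output is easier to read and reuse
  summarise(count = n(),
            mean_price = mean(average_price),
            mean_x4046 = mean(x4046),
            mean_x4225 = mean(x4225),
            mean_x4770 = mean(x4770))
regions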
raw_avocado %>%
  ggplot(aes(x = average_price, y = region)) +
  geom_boxplot()

Phew - what a mess

Let’s rotate it

raw_avocado %>%
  ggplot(aes(x = region, y = average_price)) +
  geom_boxplot() +
  theme(axis.text.x = element_text(angle = 45, hjust = 1))

Ugly graph - but it gives us a glimpse of the variation between regions - so perhaps region is important after all.
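One possible way to keep some regional signal without 53 dummy variables is to keep only the few regions with the highest mean price and collapse the rest into an "Other" level. This is a sketch on made-up data, not the approach taken in this notebook: the `toy` data frame, the choice of three regions, and the `region_grouped` name are all assumptions for illustration.

```r
set.seed(11)

# Hypothetical toy data: 20 regions, 30 rows each
region_names <- paste0("region_", 1:20)
toy <- data.frame(
  region = rep(region_names, each = 30),
  stringsAsFactors = FALSE
)

# Made-up prices: the first three regions are clearly more expensive
base_price <- c(rep(2.0, 3), rep(1.2, 17))
names(base_price) <- region_names
toy$price <- unname(base_price[toy$region]) + rnorm(nrow(toy), sd = 0.1)

# Mean price per region, then the names of the three most expensive regions
mean_price <- tapply(toy$price, toy$region, mean)
keep <- names(sort(mean_price, decreasing = TRUE))[1:3]

# Keep those regions as their own levels, lump everything else into "Other"
toy$region_grouped <- factor(ifelse(toy$region %in% keep, toy$region, "Other"))
print(levels(toy$region_grouped))
```

If the aim is a predictive model, the regions to keep should be chosen using the training data only, since the choice uses the response variable.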

# Tidy up variables
# Remove row count, date and month
avocado_trim <- raw_avocado %>% 
  select(-c(x1, date, month))

Start Modelling

Check for aliased variables

alias(lm(average_price ~ ., data = avocado_trim))
Model :
average_price ~ total_volume + x4046 + x4225 + x4770 + total_bags + 
    small_bags + large_bags + x_large_bags + type + year + region + 
    quarter

Looks like we have no aliased variables - we are good to go
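For comparison, this is roughly what alias() reports when a predictor is an exact linear combination of others. The sketch uses made-up data (the `toy` data frame and its `small`, `large` and `total` columns are hypothetical) and base R only:

```r
set.seed(7)

# Hypothetical toy data in which total is exactly small + large
toy <- data.frame(
  small = runif(50, 0, 100),
  large = runif(50, 0, 100)
)
toy$total <- toy$small + toy$large
toy$price <- 1 + 0.01 * toy$small + rnorm(50, sd = 0.1)

fit <- lm(price ~ small + large + total, data = toy)

# The aliased term gets an NA coefficient in the fitted model
print(coef(fit))

# alias() adds a "Complete" section naming the aliased term and
# the combination of the other terms that reproduces it
aliased <- alias(fit)
print(aliased$Complete)
```

When the output contains only the `Model :` section, as above for the avocado data, no exact linear dependence was detected.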

Run ggpairs

# This causes errors because of the regions
avocado_trim %>% 
GGally::ggpairs()
Error in stop_if_high_cardinality(data, columns, cardinality_threshold) : 
  Column 'region' has more levels (54) than the threshold (15) allowed.
Please remove the column or increase the 'cardinality_threshold' parameter. Increasing the cardinality_threshold may produce long processing times
# Let's see if it works if we split the data into numeric and non-numeric columns
avocado_trim_numeric <- avocado_trim %>%
  select_if(is.numeric)

avocado_trim_nonnumeric <- avocado_trim %>%
  select_if(function(x) !is.numeric(x))

# The response column is called average_price (there is no price column)
avocado_trim_nonnumeric$average_price <- avocado_trim$average_price
ggpairs(avocado_trim_numeric)


ggpairs(avocado_trim_nonnumeric)
Error in stop_if_high_cardinality(data, columns, cardinality_threshold) : 
  Column 'region' has more levels (54) than the threshold (15) allowed.
Please remove the column or increase the 'cardinality_threshold' parameter. Increasing the cardinality_threshold may produce long processing times

So - some observations: region still exceeds the ggpairs cardinality threshold, so we need to rethink how to handle it. quarter is being treated as numeric rather than categorical, so it needs to be recoded.
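Why the numeric coding matters can be seen on toy data: as a number, quarter gets a single slope, which forces a straight-line trend from Q1 to Q4, whereas as a factor each quarter gets its own level. The `toy` data frame and its prices below are made up for illustration:

```r
set.seed(1)

# Hypothetical toy data: price peaks mid-year, so it is not linear in quarter
toy <- data.frame(quarter = rep(1:4, each = 25))
toy$price <- c(1.2, 1.5, 1.6, 1.3)[toy$quarter] + rnorm(100, sd = 0.05)

# quarter as a number: intercept plus one slope
fit_numeric <- lm(price ~ quarter, data = toy)

# quarter as a factor: intercept plus three dummy coefficients
fit_factor <- lm(price ~ factor(quarter), data = toy)

print(length(coef(fit_numeric)))
print(length(coef(fit_factor)))

# The factor version can follow the mid-year peak, so it fits better here
print(summary(fit_numeric)$r.squared)
print(summary(fit_factor)$r.squared)
```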

Recode problem data

# Recode quarter as a text label (Q1 to Q4) so it is treated as categorical
avocado_trim <- avocado_trim %>% 
  mutate(quarter = str_c("Q", quarter))
# Remove regions
avocado_trim_nr <- avocado_trim %>% 
  select(-c(region))
# Attempt two
avocado_trim_numeric <- avocado_trim_nr %>%
  select_if(is.numeric)

avocado_trim_nonnumeric <- avocado_trim_nr %>%
  select_if(function(x) !is.numeric(x))

avocado_trim_nonnumeric$average_price <- avocado_trim_nr$average_price

ggpairs(avocado_trim_numeric)

 plot: [8,3] [=========================================>----------------] 73% est: 2s 
 plot: [8,4] [==========================================>---------------] 74% est: 2s 
 plot: [8,5] [===========================================>--------------] 75% est: 1s 
 plot: [8,6] [===========================================>--------------] 76% est: 1s 
 plot: [8,7] [============================================>-------------] 77% est: 1s 
 plot: [8,8] [============================================>-------------] 78% est: 1s 
 plot: [8,9] [=============================================>------------] 79% est: 1s 
 plot: [8,10] [=============================================>-----------] 80% est: 1s 
 plot: [9,1] [==============================================>-----------] 81% est: 1s 
 plot: [9,2] [===============================================>----------] 82% est: 1s 
 plot: [9,3] [===============================================>----------] 83% est: 1s 
 plot: [9,4] [================================================>---------] 84% est: 1s 
 plot: [9,5] [================================================>---------] 85% est: 1s 
 plot: [9,6] [=================================================>--------] 86% est: 1s 
 plot: [9,7] [=================================================>--------] 87% est: 1s 
 plot: [9,8] [==================================================>-------] 88% est: 1s 
 plot: [9,9] [===================================================>------] 89% est: 1s 
 plot: [9,10] [==================================================>------] 90% est: 1s 
 plot: [10,1] [===================================================>-----] 91% est: 1s 
 plot: [10,2] [===================================================>-----] 92% est: 0s 
 plot: [10,3] [====================================================>----] 93% est: 0s 
 plot: [10,4] [=====================================================>---] 94% est: 0s 
 plot: [10,5] [=====================================================>---] 95% est: 0s 
 plot: [10,6] [======================================================>--] 96% est: 0s 
 plot: [10,7] [======================================================>--] 97% est: 0s 
 plot: [10,8] [=======================================================>-] 98% est: 0s 
 plot: [10,9] [=======================================================>-] 99% est: 0s 
 plot: [10,10] [========================================================]100% est: 0s 
                                                                                      

ggpairs(avocado_trim_nonnumeric)
`stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
`stat_bin()` using `bins = 30`. Pick better value with `binwidth`.

Non-numeric variables: type is definitely a key variable; quarter has some influence.

Numeric correlations with average price:

variable       correlation
year            0.093
xl bags        -0.118
large bags     -0.173
x4225          -0.173
small bags     -0.175
total bags     -0.177
x4770          -0.179
total volume   -0.193
x4046          -0.208

The three strongest correlations (by absolute value) are x4046 (-0.208), total volume (-0.193) and x4770 (-0.179). All of them are weak.
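The ranking above can be reproduced without reading values off the ggpairs plot. This is a minimal sketch on made-up toy data (the `toy` data frame and its values are assumptions, not the avocado data); it computes each predictor's correlation with `average_price` and sorts by absolute value.

```r
# Toy data standing in for the numeric avocado columns (values are made up).
set.seed(1)
toy <- data.frame(
  average_price = rnorm(50, mean = 1.4, sd = 0.3),
  total_volume  = rlnorm(50, meanlog = 10),
  x4046         = rlnorm(50, meanlog = 9),
  year          = sample(2015:2018, 50, replace = TRUE)
)

# Correlation of every predictor with the response.
predictors <- setdiff(names(toy), "average_price")
cors <- sapply(toy[predictors], function(col) cor(col, toy$average_price))

# Rank by absolute value so strong negative correlations are not overlooked.
ranked <- cors[order(abs(cors), decreasing = TRUE)]
round(ranked, 3)
```

Sorting on `abs()` matters here because almost all of the correlations in this dataset are negative.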

Try exhaustive modelling to identify key variables.

# exhaustive selection
regsubsets_exhaustive <- regsubsets(average_price ~ ., 
                                 data = avocado_trim_nr, 
                                 nvmax = 8, # maximum size of subsets
                                 method = "exhaustive")
sum_regsubsets_exhaustive <- summary(regsubsets_exhaustive)
sum_regsubsets_exhaustive
Subset selection object
Call: regsubsets.formula(average_price ~ ., data = avocado_trim_nr, 
    nvmax = 8, method = "exhaustive")
13 Variables  (and intercept)
             Forced in Forced out
total_volume     FALSE      FALSE
x4046            FALSE      FALSE
x4225            FALSE      FALSE
x4770            FALSE      FALSE
total_bags       FALSE      FALSE
small_bags       FALSE      FALSE
large_bags       FALSE      FALSE
x_large_bags     FALSE      FALSE
typeorganic      FALSE      FALSE
year             FALSE      FALSE
quarterQ2        FALSE      FALSE
quarterQ3        FALSE      FALSE
quarterQ4        FALSE      FALSE
1 subsets of each size up to 8
Selection Algorithm: exhaustive
         total_volume x4046 x4225 x4770 total_bags small_bags large_bags x_large_bags
1  ( 1 ) " "          " "   " "   " "   " "        " "        " "        " "         
2  ( 1 ) " "          " "   " "   " "   " "        " "        " "        " "         
3  ( 1 ) " "          " "   " "   " "   " "        " "        " "        " "         
4  ( 1 ) " "          " "   " "   " "   " "        " "        " "        " "         
5  ( 1 ) " "          " "   " "   " "   " "        " "        " "        " "         
6  ( 1 ) " "          "*"   "*"   " "   " "        " "        " "        " "         
7  ( 1 ) " "          "*"   "*"   " "   " "        " "        " "        " "         
8  ( 1 ) "*"          " "   "*"   " "   " "        "*"        " "        " "         
         typeorganic year quarterQ2 quarterQ3 quarterQ4
1  ( 1 ) "*"         " "  " "       " "       " "      
2  ( 1 ) "*"         " "  " "       "*"       " "      
3  ( 1 ) "*"         " "  " "       "*"       "*"      
4  ( 1 ) "*"         "*"  " "       "*"       "*"      
5  ( 1 ) "*"         "*"  "*"       "*"       "*"      
6  ( 1 ) "*"         "*"  " "       "*"       "*"      
7  ( 1 ) "*"         "*"  "*"       "*"       "*"      
8  ( 1 ) "*"         "*"  "*"       "*"       "*"      
sum_regsubsets_exhaustive$which
  (Intercept) total_volume x4046 x4225 x4770 total_bags small_bags large_bags
1        TRUE        FALSE FALSE FALSE FALSE      FALSE      FALSE      FALSE
2        TRUE        FALSE FALSE FALSE FALSE      FALSE      FALSE      FALSE
3        TRUE        FALSE FALSE FALSE FALSE      FALSE      FALSE      FALSE
4        TRUE        FALSE FALSE FALSE FALSE      FALSE      FALSE      FALSE
5        TRUE        FALSE FALSE FALSE FALSE      FALSE      FALSE      FALSE
6        TRUE        FALSE  TRUE  TRUE FALSE      FALSE      FALSE      FALSE
7        TRUE        FALSE  TRUE  TRUE FALSE      FALSE      FALSE      FALSE
8        TRUE         TRUE FALSE  TRUE FALSE      FALSE       TRUE      FALSE
  x_large_bags typeorganic  year quarterQ2 quarterQ3 quarterQ4
1        FALSE        TRUE FALSE     FALSE     FALSE     FALSE
2        FALSE        TRUE FALSE     FALSE      TRUE     FALSE
3        FALSE        TRUE FALSE     FALSE      TRUE      TRUE
4        FALSE        TRUE  TRUE     FALSE      TRUE      TRUE
5        FALSE        TRUE  TRUE      TRUE      TRUE      TRUE
6        FALSE        TRUE  TRUE     FALSE      TRUE      TRUE
7        FALSE        TRUE  TRUE      TRUE      TRUE      TRUE
8        FALSE        TRUE  TRUE      TRUE      TRUE      TRUE
plot(regsubsets_exhaustive, scale = "adjr2")

plot(regsubsets_exhaustive, scale = "bic")

plot(sum_regsubsets_exhaustive$rsq, type = "b")

Interestingly there is no elbow in the plot so there is no clear point at which to stop modelling.

plot(sum_regsubsets_exhaustive$bic, type = "b")

summary(regsubsets_exhaustive)$which[6,]
 (Intercept) total_volume        x4046        x4225        x4770   total_bags 
        TRUE        FALSE         TRUE         TRUE        FALSE        FALSE 
  small_bags   large_bags x_large_bags  typeorganic         year    quarterQ2 
       FALSE        FALSE        FALSE         TRUE         TRUE        FALSE 
   quarterQ3    quarterQ4 
        TRUE         TRUE 

Exhaustive modelling suggests that the key variables (in order of entry) are: type (organic), quarter (Q3), quarter (Q4), year, quarter (Q2).
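The order of entry can be read off the `$which` matrix programmatically. The sketch below hard-codes the first five rows and the last five columns of the matrix printed above (the name `which_mat` is just for illustration) and finds the smallest model size at which each term appears.

```r
# First five model sizes, restricted to the terms that actually enter,
# copied from the sum_regsubsets_exhaustive$which output.
which_mat <- matrix(
  c(TRUE, FALSE, FALSE, FALSE, FALSE,
    TRUE, FALSE, FALSE, TRUE,  FALSE,
    TRUE, FALSE, FALSE, TRUE,  TRUE,
    TRUE, TRUE,  FALSE, TRUE,  TRUE,
    TRUE, TRUE,  TRUE,  TRUE,  TRUE),
  nrow = 5, byrow = TRUE,
  dimnames = list(as.character(1:5),
                  c("typeorganic", "year", "quarterQ2", "quarterQ3", "quarterQ4"))
)

# Smallest model size at which each term is selected.
first_size <- apply(which_mat, 2, function(col) min(which(col)))

# Terms sorted by the size at which they first appear.
entry_order <- names(sort(first_size))
entry_order
```

This reading only makes sense while the best subsets are nested. In the full output, quarterQ2 drops out again at size 6, so the "order" is a rough guide rather than a strict ranking.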

First Variable selection

Model 1a - type

Average Price is our predicted value

Average price = 1.158 + (0.496 x Organic(type))

If an avocado is organic, its predicted price is 0.496 higher than that of a non-organic avocado.

The p-value is less than 0.05 so we know this is statistically significant. The R^2 value tells us that 37.9% of the variation in the average price can be accounted for by the avocado being organic.

Before we accept this as our first variable let’s check with our second predictor - quarter 3

Model 1b - quarter

Average Price is our predicted value

For quarter 3: Average price = 1.30660 + (0.20631 x Quarter3)

In quarter 3 the predicted price is 0.20631 higher than in quarter 1 (the baseline quarter).

The p-value is less than 0.05 so we know this is statistically significant. The R^2 value tells us that 4% of the variation in the average price can be accounted for by quarter.

Model 1a is definitely a better model than Model 1b, so let’s choose type for the first variable.
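The comparison between the two candidate first variables comes down to fitting each single-predictor model and comparing R^2. A minimal sketch on simulated data follows; the data frame `toy`, the level name "conventional" and the effect sizes are all assumptions chosen for illustration.

```r
# Simulated data: a strong type effect and a weak quarter effect (made up).
set.seed(42)
n <- 200
toy <- data.frame(
  type    = factor(rep(c("conventional", "organic"), each = n / 2)),
  quarter = factor(sample(c("Q1", "Q2", "Q3", "Q4"), n, replace = TRUE))
)
toy$average_price <- 1.2 + 0.5 * (toy$type == "organic") +
  0.1 * (toy$quarter == "Q3") + rnorm(n, sd = 0.3)

# One model per candidate predictor.
candidates <- list(
  type    = lm(average_price ~ type, data = toy),
  quarter = lm(average_price ~ quarter, data = toy)
)

# Compare the share of variance each predictor explains on its own.
r_squared <- sapply(candidates, function(m) summary(m)$r.squared)
round(r_squared, 3)
names(which.max(r_squared))
```

Both models have a single predictor here, but quarter uses three dummy coefficients, so adjusted R^2 would be the fairer yardstick if the gap were small.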

Second Variable selection

Now we need to rerun the analysis to determine the next variable

avocado_rem_resid <- avocado_trim_nr %>%
  add_residuals(model1a) %>%
  select(-c("average_price", "type"))
ggpairs(avocado_rem_resid)
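What `add_residuals()` contributes is a `resid` column holding observed minus predicted price, so the next round of selection looks only at variation that type has not already explained. A base-R sketch of the same idea on simulated data (`toy`, `first_model` and the effect sizes are assumptions):

```r
# Simulated data with a type effect and a small year trend (made up).
set.seed(7)
toy <- data.frame(
  type = factor(rep(c("conventional", "organic"), times = 50)),
  year = rep(2015:2018, each = 25)
)
toy$average_price <- 1.1 + 0.5 * (toy$type == "organic") +
  0.04 * (toy$year - 2015) + rnorm(100, sd = 0.2)

# Fit the first-variable model, then compute residuals by hand.
first_model <- lm(average_price ~ type, data = toy)
toy$resid <- unname(toy$average_price - predict(first_model, newdata = toy))

# Drop the response and the predictor already in the model.
remaining <- toy[, setdiff(names(toy), c("average_price", "type"))]

# Correlation of a remaining predictor with the unexplained variation.
cor(remaining$year, remaining$resid)
```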


Correlations with the residual:

variable       correlation
total volume   -0.063
x4046          -0.088
x4225          -0.038
x4770          -0.064
total bags     -0.055
small bags     -0.049
large bags     -0.069
xl bags        -0.012
year            0.118

quarter: some variation. Do I need to make this a dummy?
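On the dummy question: when a factor goes through R's formula interface, the dummy columns are generated automatically, which is consistent with the quarterQ2, quarterQ3 and quarterQ4 terms that appear in the regsubsets output. A small sketch with `model.matrix()` on a made-up data frame shows the expansion:

```r
# Tiny made-up example: quarter stored as a factor.
toy <- data.frame(
  quarter = factor(c("Q1", "Q2", "Q3", "Q4", "Q3")),
  resid   = c(-0.10, 0.02, 0.15, 0.08, 0.12)
)

# The design matrix that the formula interface builds behind the scenes.
design <- model.matrix(resid ~ quarter, data = toy)
colnames(design)
design
```

The first level (Q1) has no column of its own; it is the baseline absorbed into the intercept, so no manual dummy coding should be needed as long as quarter is a factor or character column.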

# exhaustive selection
regsubsets_exhaustive2 <- regsubsets(resid ~ ., 
                                 data = avocado_rem_resid, 
                                 nvmax = 8, # maximum size of subsets
                                 method = "exhaustive")
sum_regsubsets_exhaustive2 <- summary(regsubsets_exhaustive2)
sum_regsubsets_exhaustive2
Subset selection object
Call: regsubsets.formula(resid ~ ., data = avocado_rem_resid, nvmax = 8, 
    method = "exhaustive")
12 Variables  (and intercept)
             Forced in Forced out
total_volume     FALSE      FALSE
x4046            FALSE      FALSE
x4225            FALSE      FALSE
x4770            FALSE      FALSE
total_bags       FALSE      FALSE
small_bags       FALSE      FALSE
large_bags       FALSE      FALSE
x_large_bags     FALSE      FALSE
year             FALSE      FALSE
quarterQ2        FALSE      FALSE
quarterQ3        FALSE      FALSE
quarterQ4        FALSE      FALSE
1 subsets of each size up to 8
Selection Algorithm: exhaustive
         total_volume x4046 x4225 x4770 total_bags small_bags large_bags x_large_bags
1  ( 1 ) " "          " "   " "   " "   " "        " "        " "        " "         
2  ( 1 ) " "          " "   " "   " "   " "        " "        " "        " "         
3  ( 1 ) " "          " "   " "   " "   " "        " "        " "        " "         
4  ( 1 ) " "          " "   " "   " "   " "        " "        " "        " "         
5  ( 1 ) " "          "*"   "*"   " "   " "        " "        " "        " "         
6  ( 1 ) " "          "*"   "*"   " "   " "        " "        " "        " "         
7  ( 1 ) "*"          " "   "*"   " "   " "        "*"        " "        " "         
8  ( 1 ) "*"          "*"   " "   "*"   " "        " "        "*"        " "         
         year quarterQ2 quarterQ3 quarterQ4
1  ( 1 ) " "  " "       "*"       " "      
2  ( 1 ) " "  " "       "*"       "*"      
3  ( 1 ) "*"  " "       "*"       "*"      
4  ( 1 ) "*"  "*"       "*"       "*"      
5  ( 1 ) "*"  " "       "*"       "*"      
6  ( 1 ) "*"  "*"       "*"       "*"      
7  ( 1 ) "*"  "*"       "*"       "*"      
8  ( 1 ) "*"  "*"       "*"       "*"      

Top variables are Q3, Q4, Year, Q2

I tried to put region back in but it still throws errors in the exhaustive search. I may test it directly in a model anyway.

So let’s compare quarter, year and region and see which works best.

Model 2a - quarter

# model 2a - using quarter as the variable
# bringing back in the original dataset with regions
model2a <- lm(average_price ~ type + quarter, data = avocado_trim)
model2a

Call:
lm(formula = average_price ~ type + quarter, data = avocado_trim)

Coefficients:
(Intercept)  typeorganic    quarterQ2    quarterQ3    quarterQ4  
    1.05863      0.49596      0.06855      0.20631      0.15204  

Average Price is our predicted value

For Quarter 3: Average price = 1.05863 + (0.496 x Organic(type)) + (0.20631 x Quarter3)

If an avocado is organic and sold in quarter 3, its predicted price is 0.496 + 0.20631 = 0.702 higher than that of a non-organic avocado in quarter 1 (the baseline).
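The equation can be checked with a few lines of arithmetic. The sketch below copies the coefficients printed for model2a into a named vector; the helper `predict_price()` is hypothetical and only exists for this illustration.

```r
# Coefficients copied from the model2a output above.
coefs <- c(intercept = 1.05863, typeorganic = 0.49596,
           quarterQ2 = 0.06855, quarterQ3 = 0.20631, quarterQ4 = 0.15204)

# Hypothetical helper: Q1 is the baseline, so it adds nothing.
predict_price <- function(organic, quarter) {
  q_term <- if (quarter == "Q1") 0 else coefs[[paste0("quarter", quarter)]]
  coefs[["intercept"]] + coefs[["typeorganic"]] * organic + q_term
}

predict_price(organic = TRUE,  quarter = "Q3")  # 1.05863 + 0.49596 + 0.20631 = 1.7609
predict_price(organic = FALSE, quarter = "Q1")  # intercept only: 1.05863
```

With the fitted model object available, `predict(model2a, newdata = ...)` would give the same numbers without retyping coefficients.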

summary(model2a)

Call:
lm(formula = average_price ~ type + quarter, data = avocado_trim)

Residuals:
     Min       1Q   Median       3Q      Max 
-1.11458 -0.20089 -0.02458  0.18542  1.54687 

Coefficients:
            Estimate Std. Error t value Pr(>|t|)    
(Intercept) 1.058626   0.004718  224.38   <2e-16 ***
typeorganic 0.495958   0.004543  109.16   <2e-16 ***
quarterQ2   0.068546   0.006282   10.91   <2e-16 ***
quarterQ3   0.206308   0.006281   32.84   <2e-16 ***
quarterQ4   0.152040   0.006237   24.38   <2e-16 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 0.3069 on 18244 degrees of freedom
Multiple R-squared:  0.4193,    Adjusted R-squared:  0.4192 
F-statistic:  3294 on 4 and 18244 DF,  p-value: < 2.2e-16

The p-values are all less than 0.05, so every term is statistically significant. The R^2 value tells us that 41.93% of the variation in the average price can be accounted for by type and quarter together.

par(mfrow = c(2,2))
plot(model2a)

I am liking the Q-Q here

Model 2b - year

# model 2b - using year as the variable
# bringing back in the original dataset with regions
model2b <- lm(average_price ~ type + year, data = avocado_trim)
model2b

Call:
lm(formula = average_price ~ type + year, data = avocado_trim)

Coefficients:
(Intercept)  typeorganic         year  
  -79.35649      0.49596      0.03993  

Going to stop looking at year now, as it is being treated as a numeric variable (a single slope per year, with the intercept extrapolated back to year 0).
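The odd intercept is a side effect of the numeric coding: -79.35649 + 0.03993 x 2015 is about 1.10, a sensible 2015 price. If year were to be kept, one option would be to wrap it in `factor()` so each year gets its own dummy. A sketch on simulated data (the `toy` data frame and year effects are assumptions):

```r
# Simulated prices with a non-linear year pattern (made up).
set.seed(3)
toy <- data.frame(year = rep(2015:2018, each = 30))
toy$average_price <- 1.3 + c(0, -0.1, 0.2, 0.05)[toy$year - 2014] +
  rnorm(120, sd = 0.2)

# Numeric year: one slope. Factor year: one dummy per year after the baseline.
numeric_fit <- lm(average_price ~ year, data = toy)
factor_fit  <- lm(average_price ~ factor(year), data = toy)

names(coef(numeric_fit))
names(coef(factor_fit))

# The factor coding can never fit worse, since the straight line is a special case.
c(numeric = summary(numeric_fit)$r.squared,
  factor  = summary(factor_fit)$r.squared)
```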

Model 2c - region

# model 2c - using region as the variable
# bringing back in the original dataset with regions
model2c <- lm(average_price ~ type + region, data = avocado_trim)
model2c

Call:
lm(formula = average_price ~ type + region, data = avocado_trim)

Coefficients:
              (Intercept)                typeorganic              regionAtlanta  
                 1.313079                   0.495912                  -0.223077  
regionBaltimoreWashington                regionBoise               regionBoston  
                -0.026805                  -0.212899                  -0.030148  
   regionBuffaloRochester           regionCalifornia            regionCharlotte  
                -0.044201                  -0.165710                   0.045000  
            regionChicago     regionCincinnatiDayton             regionColumbus  
                -0.004260                  -0.351834                  -0.308254  
      regionDallasFtWorth               regionDenver              regionDetroit  
                -0.475444                  -0.342456                  -0.284941  
        regionGrandRapids           regionGreatLakes   regionHarrisburgScranton  
                -0.056036                  -0.222485                  -0.047751  
regionHartfordSpringfield              regionHouston         regionIndianapolis  
                 0.257604                  -0.513107                  -0.247041  
       regionJacksonville             regionLasVegas           regionLosAngeles  
                -0.050089                  -0.180118                  -0.345030  
         regionLouisville    regionMiamiFtLauderdale             regionMidsouth  
                -0.274349                  -0.132544                  -0.156272  
          regionNashville     regionNewOrleansMobile              regionNewYork  
                -0.348935                  -0.256243                   0.166538  
          regionNortheast   regionNorthernNewEngland              regionOrlando  
                 0.040888                  -0.083639                  -0.054822  
       regionPhiladelphia        regionPhoenixTucson           regionPittsburgh  
                 0.071095                  -0.336598                  -0.196716  
             regionPlains             regionPortland    regionRaleighGreensboro  
                -0.124527                  -0.243314                  -0.005917  
    regionRichmondNorfolk              regionRoanoke           regionSacramento  
                -0.269704                  -0.313107                   0.060533  
           regionSanDiego         regionSanFrancisco              regionSeattle  
                -0.162870                   0.243166                  -0.118462  
      regionSouthCarolina         regionSouthCentral            regionSoutheast  
                -0.157751                  -0.459793                  -0.163018  
            regionSpokane              regionStLouis             regionSyracuse  
                -0.115444                  -0.130414                  -0.040710  
              regionTampa              regionTotalUS                 regionWest  
                -0.152189                  -0.242012                  -0.288817  
   regionWestTexNewMexico  
                -0.297114  

Oh wow!! This will take some analysis - so let’s look at the summary

summary(model2c)

Call:
lm(formula = average_price ~ type + region, data = avocado_trim)

Residuals:
     Min       1Q   Median       3Q      Max 
-1.09858 -0.16716 -0.01814  0.14692  1.51320 

Coefficients:
                           Estimate Std. Error t value Pr(>|t|)    
(Intercept)                1.313079   0.014894  88.159  < 2e-16 ***
typeorganic                0.495912   0.004017 123.452  < 2e-16 ***
regionAtlanta             -0.223077   0.020871 -10.688  < 2e-16 ***
regionBaltimoreWashington -0.026805   0.020871  -1.284  0.19906    
regionBoise               -0.212899   0.020871 -10.201  < 2e-16 ***
regionBoston              -0.030148   0.020871  -1.444  0.14863    
regionBuffaloRochester    -0.044201   0.020871  -2.118  0.03421 *  
regionCalifornia          -0.165710   0.020871  -7.940 2.15e-15 ***
regionCharlotte            0.045000   0.020871   2.156  0.03109 *  
regionChicago             -0.004260   0.020871  -0.204  0.83826    
regionCincinnatiDayton    -0.351834   0.020871 -16.857  < 2e-16 ***
regionColumbus            -0.308254   0.020871 -14.769  < 2e-16 ***
regionDallasFtWorth       -0.475444   0.020871 -22.780  < 2e-16 ***
regionDenver              -0.342456   0.020871 -16.408  < 2e-16 ***
regionDetroit             -0.284941   0.020871 -13.652  < 2e-16 ***
regionGrandRapids         -0.056036   0.020871  -2.685  0.00726 ** 
regionGreatLakes          -0.222485   0.020871 -10.660  < 2e-16 ***
regionHarrisburgScranton  -0.047751   0.020871  -2.288  0.02216 *  
regionHartfordSpringfield  0.257604   0.020871  12.342  < 2e-16 ***
regionHouston             -0.513107   0.020871 -24.584  < 2e-16 ***
regionIndianapolis        -0.247041   0.020871 -11.836  < 2e-16 ***
regionJacksonville        -0.050089   0.020871  -2.400  0.01641 *  
regionLasVegas            -0.180118   0.020871  -8.630  < 2e-16 ***
regionLosAngeles          -0.345030   0.020871 -16.531  < 2e-16 ***
regionLouisville          -0.274349   0.020871 -13.145  < 2e-16 ***
regionMiamiFtLauderdale   -0.132544   0.020871  -6.351 2.20e-10 ***
regionMidsouth            -0.156272   0.020871  -7.487 7.35e-14 ***
regionNashville           -0.348935   0.020871 -16.718  < 2e-16 ***
regionNewOrleansMobile    -0.256243   0.020871 -12.277  < 2e-16 ***
regionNewYork              0.166538   0.020871   7.979 1.56e-15 ***
regionNortheast            0.040888   0.020871   1.959  0.05013 .  
regionNorthernNewEngland  -0.083639   0.020871  -4.007 6.16e-05 ***
regionOrlando             -0.054822   0.020871  -2.627  0.00863 ** 
regionPhiladelphia         0.071095   0.020871   3.406  0.00066 ***
regionPhoenixTucson       -0.336598   0.020871 -16.127  < 2e-16 ***
regionPittsburgh          -0.196716   0.020871  -9.425  < 2e-16 ***
regionPlains              -0.124527   0.020871  -5.966 2.47e-09 ***
regionPortland            -0.243314   0.020871 -11.658  < 2e-16 ***
regionRaleighGreensboro   -0.005917   0.020871  -0.284  0.77679    
regionRichmondNorfolk     -0.269704   0.020871 -12.922  < 2e-16 ***
regionRoanoke             -0.313107   0.020871 -15.002  < 2e-16 ***
regionSacramento           0.060533   0.020871   2.900  0.00373 ** 
regionSanDiego            -0.162870   0.020871  -7.803 6.35e-15 ***
regionSanFrancisco         0.243166   0.020871  11.651  < 2e-16 ***
regionSeattle             -0.118462   0.020871  -5.676 1.40e-08 ***
regionSouthCarolina       -0.157751   0.020871  -7.558 4.28e-14 ***
regionSouthCentral        -0.459793   0.020871 -22.030  < 2e-16 ***
regionSoutheast           -0.163018   0.020871  -7.811 6.00e-15 ***
regionSpokane             -0.115444   0.020871  -5.531 3.22e-08 ***
regionStLouis             -0.130414   0.020871  -6.248 4.24e-10 ***
regionSyracuse            -0.040710   0.020871  -1.951  0.05113 .  
regionTampa               -0.152189   0.020871  -7.292 3.18e-13 ***
regionTotalUS             -0.242012   0.020871 -11.595  < 2e-16 ***
regionWest                -0.288817   0.020871 -13.838  < 2e-16 ***
regionWestTexNewMexico    -0.297114   0.020918 -14.204  < 2e-16 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 0.2713 on 18194 degrees of freedom
Multiple R-squared:  0.5473,    Adjusted R-squared:  0.546 
F-statistic: 407.4 on 54 and 18194 DF,  p-value: < 2.2e-16

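Most of the region coefficients have a p-value below 0.05, but for a few regions (for example BaltimoreWashington, Boston, Chicago and RaleighGreensboro) it is greater than 0.05. Each of these p-values only tests whether that region's average price differs from the reference region, so a large one means that region is priced much like the reference region, not that region as a whole is a poor predictor.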
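To judge a multi-level factor as a whole rather than level by level, one option is a single F-test on the term. The sketch below is illustrative only: it uses simulated stand-in data (the `sim` data frame, its four made-up regions and the effect sizes are all assumptions, not the avocado data) and base R's `drop1()`.

```r
# Simulated stand-in data (hypothetical; NOT the avocado dataset)
set.seed(42)
n <- 300
sim <- data.frame(
  type   = factor(sample(c("conventional", "organic"), n, replace = TRUE)),
  region = factor(sample(c("A", "B", "C", "D"), n, replace = TRUE))
)

# Made-up region effects: B is close to the reference level A, C and D are not
region_effect <- c(A = 0, B = 0.02, C = -0.30, D = 0.25)
sim$price <- 1.2 + 0.5 * (sim$type == "organic") +
  unname(region_effect[as.character(sim$region)]) +
  rnorm(n, sd = 0.25)

fit <- lm(price ~ type + region, data = sim)

# Per-level t-tests: each region level is compared with the reference level "A"
coef_table <- summary(fit)$coefficients
print(round(coef_table, 4))

# One F-test per term: does dropping the whole factor make the fit worse?
joint <- drop1(fit, test = "F")
print(joint)
```

In a setup like this, an individual level can look non-significant while the `region` row of `drop1()` is still clearly significant, which is the same situation as in the summary above.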

The R^2 value tells us that 54.73% of the variation in the average price can be accounted for by the avocado's type (conventional or organic) and the region it is in.

par(mfrow = c(2,2))
plot(model2c)

Compare Model 1a, 2a and 2c

Time to use anova to compare the models:

anova(model1a, model2a)
Analysis of Variance Table

Model 1: average_price ~ type
Model 2: average_price ~ type + quarter
  Res.Df    RSS Df Sum of Sq      F    Pr(>F)    
1  18247 1836.7                                  
2  18244 1718.2  3    118.54 419.56 < 2.2e-16 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

The null hypothesis here is that the two models explain the same amount of response variance. The alternative is that they don’t. In this case we find a p-value less than 0.05, so we reject the null hypothesis and say that the model including quarter is significantly better than the model with type alone!
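As a rough check on where the F value comes from, it can be rebuilt from the RSS and Res.Df columns of the table. This is a sketch using only the rounded numbers printed in the output; the variable names are made up for illustration.

```r
# Values copied from the printed anova(model1a, model2a) table (rounded)
rss_small <- 1836.7   # RSS of average_price ~ type
rss_large <- 1718.2   # RSS of average_price ~ type + quarter
df_small  <- 18247    # residual df of the smaller model
df_large  <- 18244    # residual df of the larger model

# Extra parameters used by the larger model
df_diff <- df_small - df_large

# F = (drop in RSS per extra parameter) / (residual variance of the larger model)
f_stat <- ((rss_small - rss_large) / df_diff) / (rss_large / df_large)

# Upper-tail probability of the F distribution
p_value <- pf(f_stat, df_diff, df_large, lower.tail = FALSE)

print(f_stat)
print(p_value)
```

Because the inputs are rounded, the result should land close to, but not exactly on, the 419.56 reported in the table.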

However, the model including region still fits better overall (it has the higher R^2), so we choose region over quarter as the second variable in this case. But perhaps we can include quarter as a third variable?
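Note that `type + quarter` and `type + region` are not nested in each other, so `anova()` cannot test one against the other directly. A common alternative is to put adjusted R^2, AIC and BIC side by side. The sketch below does this on simulated stand-in data (the `sim` data frame and all effect sizes are assumptions for illustration, not the avocado data).

```r
# Simulated stand-in data (hypothetical; NOT the avocado dataset)
set.seed(1)
n <- 400
sim <- data.frame(
  type    = factor(sample(c("conventional", "organic"), n, replace = TRUE)),
  quarter = factor(sample(paste0("Q", 1:4), n, replace = TRUE)),
  region  = factor(sample(c("North", "South", "West"), n, replace = TRUE))
)
# Made-up effects: region matters more than quarter in this simulation
sim$price <- 1.2 + 0.5 * (sim$type == "organic") +
  0.05 * (sim$quarter == "Q3") +
  0.30 * (sim$region == "West") +
  rnorm(n, sd = 0.2)

fit_quarter <- lm(price ~ type + quarter, data = sim)
fit_region  <- lm(price ~ type + region,  data = sim)

# Side-by-side fit measures for the two non-nested candidates
comparison <- data.frame(
  model  = c("type + quarter", "type + region"),
  adj_r2 = c(summary(fit_quarter)$adj.r.squared,
             summary(fit_region)$adj.r.squared),
  aic    = c(AIC(fit_quarter), AIC(fit_region)),
  bic    = c(BIC(fit_quarter), BIC(fit_region))
)
print(comparison)
```

Higher adjusted R^2 and lower AIC/BIC point to the better-fitting candidate; unlike raw R^2, these measures penalise the extra dummy variables a factor like region brings in.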

anova(model1a, model2c)
Analysis of Variance Table

Model 1: average_price ~ type
Model 2: average_price ~ type + region
  Res.Df    RSS Df Sum of Sq      F    Pr(>F)    
1  18247 1836.7                                  
2  18194 1339.4 53    497.26 127.44 < 2.2e-16 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Third Variable

avocado_rem_resid2 <- avocado_trim %>%
  add_residuals(model2c) %>%
  select(-c("average_price", "type", "region"))
ggpairs(avocado_rem_resid2)


Coefficients (of the Resid)
total volume -0.017
x4046 -0.018
x4225 -0.023
x4770 -0.024
total bags -0.005
small bags -0.005
large bags -0.008
xl bags -0.031
year 0.139 - disregard this
quarter - some variation - Do I need to make this a dummy?
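On the dummy question: `lm()` builds the dummy columns itself as long as the variable is a factor (or character). If quarter is left numeric it is fitted as a single straight-line trend instead. A small sketch on simulated data (the `sim` data frame and its quarter effects are made up for illustration):

```r
# Simulated stand-in data (hypothetical; NOT the avocado dataset)
set.seed(7)
sim <- data.frame(quarter_num = rep(1:4, times = 25))
sim$price <- 1.3 + c(0, 0.05, 0.20, 0.10)[sim$quarter_num] +
  rnorm(100, sd = 0.1)

# Same information, stored as a factor with levels Q1..Q4
sim$quarter_fct <- factor(paste0("Q", sim$quarter_num))

# Numeric quarter: one slope, i.e. a straight-line trend across quarters
fit_numeric <- lm(price ~ quarter_num, data = sim)

# Factor quarter: lm() creates one dummy per non-reference level
fit_factor <- lm(price ~ quarter_fct, data = sim)

print(coef(fit_numeric))
print(coef(fit_factor))

# The design matrix shows the dummy columns lm() generated
dummy_cols <- colnames(model.matrix(fit_factor))
print(dummy_cols)
```

So no manual dummy coding is needed; converting the column with `factor()` (or prefixing it with "Q" so it becomes character) is enough.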

Automated Approach (from homework answers)

regsubsets_forward <- regsubsets(average_price ~ ., 
                                 data = avocado_trim, 
                                 nvmax = 12,
                                 method = "forward")

plot(regsubsets_forward)

# See what's in model
plot(summary(regsubsets_forward)$bic, type = "b")

summary(regsubsets_forward)$which[8, ]
              (Intercept)              total_volume                     x4046                     x4225                     x4770 
                     TRUE                     FALSE                     FALSE                     FALSE                     FALSE 
               total_bags                small_bags                large_bags              x_large_bags               typeorganic 
                    FALSE                     FALSE                     FALSE                     FALSE                      TRUE 
                     year             regionAtlanta regionBaltimoreWashington               regionBoise              regionBoston 
                     TRUE                     FALSE                     FALSE                     FALSE                     FALSE 
   regionBuffaloRochester          regionCalifornia           regionCharlotte             regionChicago    regionCincinnatiDayton 
                    FALSE                     FALSE                     FALSE                     FALSE                     FALSE 
           regionColumbus       regionDallasFtWorth              regionDenver             regionDetroit         regionGrandRapids 
                    FALSE                      TRUE                     FALSE                     FALSE                     FALSE 
         regionGreatLakes  regionHarrisburgScranton regionHartfordSpringfield             regionHouston        regionIndianapolis 
                    FALSE                     FALSE                      TRUE                      TRUE                     FALSE 
       regionJacksonville            regionLasVegas          regionLosAngeles          regionLouisville   regionMiamiFtLauderdale 
                    FALSE                     FALSE                     FALSE                     FALSE                     FALSE 
           regionMidsouth           regionNashville    regionNewOrleansMobile             regionNewYork           regionNortheast 
                    FALSE                     FALSE                     FALSE                      TRUE                     FALSE 
 regionNorthernNewEngland             regionOrlando        regionPhiladelphia       regionPhoenixTucson          regionPittsburgh 
                    FALSE                     FALSE                     FALSE                     FALSE                     FALSE 
             regionPlains            regionPortland   regionRaleighGreensboro     regionRichmondNorfolk             regionRoanoke 
                    FALSE                     FALSE                     FALSE                     FALSE                     FALSE 
         regionSacramento            regionSanDiego        regionSanFrancisco             regionSeattle       regionSouthCarolina 
                    FALSE                     FALSE                      TRUE                     FALSE                     FALSE 
       regionSouthCentral           regionSoutheast             regionSpokane             regionStLouis            regionSyracuse 
                    FALSE                     FALSE                     FALSE                     FALSE                     FALSE 
              regionTampa             regionTotalUS                regionWest    regionWestTexNewMexico                   quarter 
                    FALSE                     FALSE                     FALSE                     FALSE                      TRUE 
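Rather than reading the model size off the BIC plot by eye, the size with the lowest BIC can be picked programmatically. A minimal sketch, assuming the `leaps` package is installed and using the built-in `mtcars` data as a stand-in (not the avocado data):

```r
library(leaps)

# Forward selection on a stand-in dataset (mtcars, NOT the avocado data)
fwd <- regsubsets(mpg ~ ., data = mtcars, nvmax = 5, method = "forward")
fwd_summary <- summary(fwd)

# Model size with the lowest BIC
best_size <- which.min(fwd_summary$bic)

# Names of the terms in that model (the intercept is always included)
selected <- names(which(fwd_summary$which[best_size, ]))
print(best_size)
print(selected)

# Coefficients of the chosen model
print(coef(fwd, id = best_size))
```

With a many-level factor such as region, `regsubsets()` treats every dummy as its own candidate, so the chosen subset can contain only some levels of the factor. Following up with `anova()` on the whole factor is one way to handle that.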
# test if we should put regions in
mod_type_year <- lm(average_price ~ type + year, data = avocado_trim)
mod_type_region <- lm(average_price ~ type + year + region, data = avocado_trim)
anova(mod_type_year, mod_type_region)
Analysis of Variance Table

Model 1: average_price ~ type + year
Model 2: average_price ~ type + year + region
  Res.Df    RSS Df Sum of Sq      F    Pr(>F)    
1  18246 1811.0                                  
2  18193 1313.7 53    497.25 129.93 < 2.2e-16 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
# test if we should put quarter in (on top of type + year)
mod_type_year <- lm(average_price ~ type + year, data = avocado_trim)
mod_type_quarter <- lm(average_price ~ type + year + quarter, data = avocado_trim)
anova(mod_type_year, mod_type_quarter)
Analysis of Variance Table

Model 1: average_price ~ type + year
Model 2: average_price ~ type + year + quarter
  Res.Df    RSS Df Sum of Sq      F    Pr(>F)    
1  18246 1811.0                                  
2  18245 1702.4  1    108.58 1163.7 < 2.2e-16 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
# now let's test if the one with region and quarter is different than the one with just region

mod_type_region_quarter <- lm(average_price ~ type + year + region + quarter, data = avocado_trim)
anova(mod_type_region_quarter, mod_type_region)
Analysis of Variance Table

Model 1: average_price ~ type + year + region + quarter
Model 2: average_price ~ type + year + region
  Res.Df    RSS Df Sum of Sq      F    Pr(>F)    
1  18192 1205.2                                  
2  18193 1313.7 -1   -108.56 1638.8 < 2.2e-16 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
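The negative Df and Sum of Sq in this last table are only a side effect of passing the larger model first; the F statistic and p-value are unaffected. A small sketch on the built-in `mtcars` data (a stand-in, not the avocado data) shows both orders:

```r
# Two nested models on a stand-in dataset (mtcars, NOT the avocado data)
small <- lm(mpg ~ wt, data = mtcars)
large <- lm(mpg ~ wt + hp, data = mtcars)

# Conventional order: smaller model first
a_forward <- anova(small, large)

# Reversed order: larger model first
a_reversed <- anova(large, small)

print(a_forward)
print(a_reversed)

# Df flips sign, while F and the p-value stay the same
print(c(forward = a_forward$Df[2], reversed = a_reversed$Df[2]))
print(c(forward = a_forward$F[2],  reversed = a_reversed$F[2]))
```

Listing the models from smallest to largest keeps the table easier to read.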
cmlhdGlvbiBpbiB0aGUgYXZlcmFnZSBwcmljZSBjYW4gYmUgYWNjb3VudGVkIGJ5IHRoZSBhdm9jYWRvIGJlaW5nIG9yZ2FuaWMuCgpgYGB7cn0KcGFyKG1mcm93ID0gYygyLDIpKQpwbG90KG1vZGVsMmEpCmBgYApJIGFtIGxpa2luZyB0aGUgUS1RIGhlcmUgCgoKIyMjIE1vZGVsIDJiIC0geWVhcgoKYGBge3J9CiMgbW9kZWwgMmIgLSB1c2luZyB5ZWFyIGFzIHRoZSB2YXJpYWJsZQojIGJyaW5naW5nIGJhY2sgaW4gdGhlIG9yaWdpbmFsIGRhdGFzZXQgd2l0aCByZWdpb25zCm1vZGVsMmIgPC0gbG0oYXZlcmFnZV9wcmljZSB+IHR5cGUgKyB5ZWFyLCBkYXRhID0gYXZvY2Fkb190cmltKQptb2RlbDJiCmBgYAoKR29pbmcgdG8gc3RvcCBsb29raW5nIGF0IHllYXIgbm93IC0gYXMgaXQgaXMgdHJlYXRpbmcgaXQgYXMgYSBudW1lcmljIAoKIyMjIE1vZGVsIDJjIC0gcmVnaW9uCgpgYGB7cn0KIyBtb2RlbCAyYyAtIHVzaW5nIHF1YXJ0ZXIgYXMgdGhlIHZhcmlhYmxlCiMgYnJpbmdpbmcgYmFjayBpbiB0aGUgb3JpZ2luYWwgZGF0YXNldCB3aXRoIHJlZ2lvbnMKbW9kZWwyYyA8LSBsbShhdmVyYWdlX3ByaWNlIH4gdHlwZSArIHJlZ2lvbiwgZGF0YSA9IGF2b2NhZG9fdHJpbSkKbW9kZWwyYwpgYGAKT2ggd293ISEgVGhpcyB3aWxsIHRha2Ugc29tZSBhbmFseXNpcyAtIHNvIGxldCdzIGxvb2sgYXQgdGhlIHN1bW1hcnkKCgpgYGB7cn0Kc3VtbWFyeShtb2RlbDJjKQpgYGAKClRoZSBwLXZhbHVlIGlzIG1vc3RseSBsZXNzIHRoYW4gMC4wNSBidXQgdGhlcmUgYXJlIHNvbWUgcmVnaW9ucyB3aGVyZSBpcyBpcyBncmVhdGVyIHRoYW4gMC4wNSB3aGljaCBjb3VsZCBtYWtlIHRoZSBkYXRhIG1pc2xlYWRpbmcKClRoZSBSXjIgdmFsdWUgdGVsbHMgdXMgdGhhdCA1NC43MyUgb2YgdGhlIHZhcmlhdGlvbiBpbiB0aGUgYXZlcmFnZSBwcmljZSBjYW4gYmUgYWNjb3VudGVkIGJ5IHRoZSBhdm9jYWRvIGJlaW5nIG9yZ2FuaWMgYW5kIGJ5IHRoZSByZWdpb24gaXQgaXMgaW4KCmBgYHtyfQpwYXIobWZyb3cgPSBjKDIsMikpCnBsb3QobW9kZWwyYykKYGBgCiMjIyBDb21wYXJlIE1vZGVsIDFhLCAyYSBhbmQgMmMKVGltZSB0byB1c2UgYW5vdmEgdG8gY29tcGFyZSB0aGUgbW9kZWxzOgoKYGBge3J9CmFub3ZhKG1vZGVsMWEsIG1vZGVsMmEpCmBgYAoKVGhlIG51bGwgaHlwb3RoZXNpcyBoZXJlIGlzIHRoYXQgdGhlIG1vZGVscyBleHBsYWluIHRoZSBzYW1lIGFtb3VudCBvZiByZXNwb25zZSB2YXJpYW5jZS4gVGhlIGFsdGVybmF0aXZlIGlzIHRoYXQgdGhleSBkb24ndC4gSW4gdGhpcyBjYXNlLCB3ZSBmaW5kIGEgcC12YWx1ZSBsZXNzIHRoYW4gMC4wNSwgYW5kIHNvIHdlIHJlamVjdCB0aGUgbnVsbCBoeXBvdGhlc2lzIGFuZCBzYXkgdGhhdCB0aGUgbW9kZWwgaW5jbHVkaW5nIHR5cGUgaXMgc2lnbmlmaWNhbnRseSBiZXR0ZXIgdGhhbiB0aGUgbW9kZWwgZXhjbHVkaW5nIGl0IQoKSG93ZXZlciwgdGhlIG1vZGVsIGluY2x1ZGluZyByZWdpb24gaXMg
c3RpbGwgYmV0dGVyIG92ZXJhbGwgKHdpdGggaGlnaGVyIHIyKSwgYW5kIHNvIHdlIGNob29zZSByZWdpb24gb3ZlciBxdWFydGVyIGluIHRoaXMgY2FzZS4gQnV0IHBlcmhhcHMgd2UgY2FuIGluY2x1ZGUgaXQgYXMgYSB0aGlyZCB2YXJpYWJsZT8KCmBgYHtyfQphbm92YShtb2RlbDFhLCBtb2RlbDJjKQpgYGAKCiMjIFRoaXJkIFZhcmlhYmxlCgpgYGB7cn0KYXZvY2Fkb19yZW1fcmVzaWQyIDwtIGF2b2NhZG9fdHJpbSAlPiUKICBhZGRfcmVzaWR1YWxzKG1vZGVsMmMpICU+JQogIHNlbGVjdCgtYygiYXZlcmFnZV9wcmljZSIsICJ0eXBlIiwgInJlZ2lvbiIpKQpnZ3BhaXJzKGF2b2NhZG9fcmVtX3Jlc2lkMikKYGBgCgpDb2VmZmljaWVudHMgKG9mIHRoZSBSZXNpZCkKdG90YWwgdm9sdW1lIC0wLjAxNwp4NDA0NiAtMC4wMTgKeDQyMjUgLTAuMDIzCng0NzcwIC0wLjAyNAp0b3RhbCBiYWdzIC0wLjAwNQpzbWFsbCBiYWdzIC0wLjAwNQpsYXJnZSBiYWdzIC0wLjAwOAp4bCBiYWdzIC0wLjAzMQp5ZWFyIDAuMTM5IC0gZGlzcmVnYXJkIHRoaXMKcXVhcnRlciAtIHNvbWUgdmFyaWF0aW9uIC0gRG8gSSBuZWVkIHRvIG1ha2UgdGhpcyBhIGR1bW15PwoKCgoKCiMgQXV0b21hdGVkIEFwcHJvYWNoIChmcm9tIGhvbWV3b3JrIGFuc3dlcnMpCmBgYHtyfQpyZWdzdWJzZXRzX2ZvcndhcmQgPC0gcmVnc3Vic2V0cyhhdmVyYWdlX3ByaWNlIH4gLiwgCiAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgIGRhdGEgPSBhdm9jYWRvX3RyaW0sIAogICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICBudm1heCA9IDEyLAogICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICBtZXRob2QgPSAiZm9yd2FyZCIpCgpwbG90KHJlZ3N1YnNldHNfZm9yd2FyZCkKYGBgCgpgYGB7cn0KIyBTZWUgd2hhdCdzIGluIG1vZGVsCnBsb3Qoc3VtbWFyeShyZWdzdWJzZXRzX2ZvcndhcmQpJGJpYywgdHlwZSA9ICJiIikKYGBgCgpgYGB7cn0Kc3VtbWFyeShyZWdzdWJzZXRzX2ZvcndhcmQpJHdoaWNoWzgsIF0KYGBgCgpgYGB7cn0KIyB0ZXN0IGlmIHdlIHNob3VsZCBwdXQgcmVnaW9ucyBpbgptb2RfdHlwZV95ZWFyIDwtIGxtKGF2ZXJhZ2VfcHJpY2UgfiB0eXBlICsgeWVhciwgZGF0YSA9IGF2b2NhZG9fdHJpbSkKbW9kX3R5cGVfcmVnaW9uIDwtIGxtKGF2ZXJhZ2VfcHJpY2UgfiB0eXBlICsgeWVhciArIHJlZ2lvbiwgZGF0YSA9IGF2b2NhZG9fdHJpbSkKYW5vdmEobW9kX3R5cGVfeWVhciwgbW9kX3R5cGVfcmVnaW9uKQpgYGAKCmBgYHtyfQojIHRlc3QgaWYgd2Ugc2hvdWxkIHB1dCB5ZWFyIGluCm1vZF90eXBlX3llYXIgPC0gbG0oYXZlcmFnZV9wcmljZSB+IHR5cGUgKyB5ZWFyLCBkYXRhID0gYXZvY2Fkb190cmltKQptb2RfdHlwZV9xdWFydGVyIDwtIGxtKGF2ZXJhZ2VfcHJpY2UgfiB0eXBlICsgeWVhciArIHF1YXJ0ZXIsIGRhdGEgPSBhdm9jYWRvX3RyaW0pCmFub3ZhKG1vZF90eXBlX3llYXIsIG1vZF90eXBlX3F1YXJ0ZXIpCmBgYAoKYGBg
e3J9CiMgbm93IGxldCdzIHRlc3QgaWYgdGhlIG9uZSB3aXRoIHJlZ2lvbiBhbmQgcXVhcnRlciBpcyBkaWZmZXJlbnQgdGhhbiB0aGUgb25lIHdpdGgganVzdCByZWdpb24KCm1vZF90eXBlX3JlZ2lvbl9xdWFydGVyIDwtIGxtKGF2ZXJhZ2VfcHJpY2UgfiB0eXBlICsgeWVhciArIHJlZ2lvbiArIHF1YXJ0ZXIsIGRhdGEgPSBhdm9jYWRvX3RyaW0pCmFub3ZhKG1vZF90eXBlX3JlZ2lvbl9xdWFydGVyLCBtb2RfdHlwZV9yZWdpb24pCmBgYAoKCgo=